Abstract. Recent statistical and computational analyses have shown that a genealogical most recent common ancestor (MRCA) may have lived in the recent past [Chang, 1999; Rohde et al., 2004]. However, coalescent-based approaches show that genetic most recent common ancestors for a given non-recombining locus are typically much more ancient [Kingman, 1982a,b]. It is not immediately clear how these two perspectives interact. This paper investigates relationships between the number of descendant alleles of an ancestor allele and the number of genealogical descendants of the individual who possessed that allele for a simple diploid genetic model extending the genealogical model of Chang [1999].



1. Introduction and model 

Joseph Chang's 1999 paper [Chang, 1999] showed that a well-mixed closed diploid population of n individuals will have a genealogical common ancestor in the recent past. Specifically, the paper showed that if $T_n$ is the number of generations back to the most recent common ancestor (MRCA) of the population, then $T_n$ divided by $\log_2 n$ converges to one in probability as n goes to infinity. His paper initiated a discussion in which many of the leading figures of population genetics expressed interest in the relationship between the genealogical and genetic perspectives for such models [Donnelly et al., 1999]. For example, Peter Donnelly wrote "[r]esults on the extent to which common ancestors, in the sense of [Chang's] paper, are ancestors in the genetic sense... would also be of great interest" [Donnelly et al., 1999]. Every other discussant also either discussed the relationship of Chang's work to genetics or expressed interest in doing so.

Given this interest, surprisingly little work has been done specifically on the interplay between the two perspectives. Wiuf and Hein, in their reply, wrote three paragraphs containing some simple initial observations [Donnelly et al., 1999]. Some simulation work has been done by Murphy [2004] with a more realistic population model. In a related though different vein, Möhle and Sagitov [2003] derived limiting results for the diploid coalescent, in the classical setting of a small sample from a large population.

In an interesting series of papers, Derrida, Manrubia, Zanette, and collaborators [Derrida et al., 1999, 2000a,b; Manrubia et al., 2003] have investigated the distribution of the number of repetitions of ancestors in a genealogical tree, as well as the degree of concordance between the genealogical trees for two distinct individuals. Our paper, on the other hand, is concerned with correlations between the number of genealogical descendants of an individual and the number of descendant alleles of that individual. The interesting time frame in our paper is different from theirs: they focus on the period substantially after $T_n$, while for us any interesting correlation is erased with high probability after time about $1.77\,T_n$.

Our paper attempts to connect the genealogical and genetic points of view by 
investigating several different questions concerning the interaction of genealogical 
ancestry and genetic ancestry in a diploid model incorporating Chang's model. 
In classical Wright-Fisher fashion, we consider 2n alleles contained in n diploid 
individuals. Each discrete generation forward in time, every individual selects two 
alleles from the previous generation independently and uniformly to "inherit." If 
an individual X at time t inherits genetic information from an individual Y at time $t - 1$, then we consider Y to be a "parent" of X in the genealogical sense. As with Chang's model, the two parents are permitted to be the same individual and each allele of a child may descend from the same parent allele. We illustrate the basic operation of the model in Figure 1. Each individual is represented as a circle, and each of a given individual's alleles is represented as a dot within the circle. Time increases down the figure and inheritance of alleles is represented by lines connecting them.
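To make the sampling rule concrete, here is a minimal Python sketch of one generation of the model described above. It assumes the alleles of the previous generation are numbered $0, \ldots, 2n-1$, with allele a belonging to individual a // 2; the names `next_generation` and `parents` are illustrative and are not taken from the authors' programs.

```python
import random


def next_generation(n, rng):
    """Sample one generation of the diploid model.

    Returns a list with one entry per child; each entry is a pair of
    (parent index, parent allele) tuples, one per allele slot. Every slot
    picks one of the 2n alleles of the previous generation uniformly and
    independently, so both slots may hit the same parent or even the
    same parent allele.
    """
    children = []
    for _ in range(n):
        slots = []
        for _ in range(2):
            a = rng.randrange(2 * n)           # uniform over the 2n alleles
            slots.append((a // 2, a % 2))      # allele a lives in individual a // 2
        children.append(tuple(slots))
    return children


def parents(child):
    """Genealogical parents of a child: the individuals its two alleles came from."""
    return {p for p, _ in child}


rng = random.Random(1)
for i, child in enumerate(next_generation(4, rng)):
    print(i, child, "parents:", sorted(parents(child)))
```

A child whose two slots land in the same individual has a single genealogical parent, which is the "same parent twice" case allowed above.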




Figure 1. An example instance of our model with four individuals and three generations. Time increases moving down the diagram. The two alleles of each individual are depicted as two dots within the larger circles; a thin black line indicates genetic inheritance, i.e. the lower allele is descended from the upper allele. This sample genealogy demonstrates that the genealogical MRCA need not have any genetic relation to present-day individuals. The individual at the far right on the top row is in this case the (unique) MRCA, as demonstrated by the thick gray lines; however, none of its genetic material is passed on to the present day.

We have chosen notation in order to fit with Chang's original article. The initial generation will be denoted $t = 0$ and other generations will be counted forwards in time; thus the parents of the $t = 1$ generation will be in the $t = 0$ generation, and so on. The n individuals of generation t will be denoted $I_{t,1}, \ldots, I_{t,n}$. The two alleles present at a given locus of individual $I_{t,i}$ will be labeled $A_{t,i,1}$ and $A_{t,i,2}$. Using this notation, each allele $A_{t,i,c}$ of generation t selects an allele $A_{t-1,j,d}$ uniformly and independently from all of the alleles of the previous generation; given such a choice we say that allele $A_{t,i,c}$ is descended genetically from allele $A_{t-1,j,d}$. We define more distant ancestry recursively: allele $A_{t,i,c}$ is descended from allele $A_{t',j,d}$ if $t > t'$ and there exist k and e such that allele $A_{t,i,c}$ is descended genetically from allele $A_{t-1,k,e}$ and allele $A_{t-1,k,e}$ is descended from or is the same as allele $A_{t',j,d}$.

One can make a similar recursive definition of genealogical ancestry that matches Chang's notion of ancestry: individual $I_{t,i}$ is descended genealogically from individual $I_{t',j}$ if $t > t'$ and there exists a k such that individual $I_{t-1,k}$ is a parent of individual $I_{t,i}$ and individual $I_{t-1,k}$ is descended from or is the same as individual $I_{t',j}$.

Define $\mathcal{Q}_t^i$ to be the set of alleles that are genetic descendants at time t of the two alleles present in individual $I_{0,i}$, and let $Q_t^i$ be the number of such alleles. We will call the elements of $\mathcal{Q}_t^i$ the descendant alleles of individual $I_{0,i}$. Define $\mathcal{G}_t^i$ to be the set of genealogical descendants at time t of the individual $I_{0,i}$, and let $G_t^i$ be the number of such individuals. We will say that a (genealogical) most recent common ancestor (MRCA) first appears at time t if there is an individual $I_{0,i}$ in the population at time 0 such that $G_t^i = n$ and $G_s^j < n$ for all j and $s < t$; that is, individual i in generation 0 is a genealogical ancestor of all individuals in generation t, but there is no individual in generation 0 that is a genealogical ancestor of all individuals in any generation previous to generation t. Let $T_n$ denote the generation number at which the MRCA first appears. The main conclusion of Chang's 1999 paper is that the ratio $T_n/\log_2 n$ converges to one in probability as n tends to infinity.
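The definition of $T_n$ can be explored by direct simulation. The following Python sketch is a hypothetical illustration, not the authors' code: it tracks, for every current individual, its set of generation-0 ancestors as a bit mask, and stops at the first generation in which some generation-0 individual is an ancestor of everyone. It also returns how many such individuals exist at that time, i.e. the number of MRCAs. Only the genealogy is simulated, so each allele slot is reduced to a uniformly chosen parent.

```python
import math
import random


def mrca_time(n, rng, max_gen=10_000):
    """Simulate the genealogy until a generation-0 individual is an ancestor
    of the whole population. Returns (T_n, number of MRCAs at that time).

    anc[j] is a bit mask over generation 0: bit i is set exactly when
    I_{0,i} is a genealogical ancestor of the current individual j, so
    G_t^i is the number of masks with bit i set.
    """
    anc = [1 << i for i in range(n)]              # generation 0
    for t in range(1, max_gen + 1):
        # Each allele slot picks one of 2n alleles uniformly, which picks
        # each individual with probability 1/n; the two parents may coincide.
        anc = [anc[rng.randrange(n)] | anc[rng.randrange(n)] for _ in range(n)]
        common = anc[0]
        for mask in anc[1:]:
            common &= mask                         # ancestors of everyone so far
        if common:
            return t, bin(common).count("1")
    raise RuntimeError("no MRCA within max_gen generations")


rng = random.Random(2)
for n in (100, 1000):
    t_n, num_mrca = mrca_time(n, rng)
    print(n, t_n, round(t_n / math.log2(n), 2), num_mrca)
```

Chang's theorem is a statement about convergence in probability, so the printed ratio $T_n/\log_2 n$ should be of order one but will fluctuate from run to run.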

Our intent is to investigate the degree to which genealogical ancestry implies genetic ancestry. Unsurprisingly, historical individuals with more genealogical descendants will have more descendant alleles in expectation: in Proposition 1 we show that $\mathbb{E}[Q_t^i \mid G_t^i = k]$ is a super-linearly increasing function of k. However, in any realization of the stochastic process, individuals with more genealogical descendants need not have more descendant alleles. For example, in Figure 1 we show a case where the MRCA has no genetic relationship to any present-day individuals. In the above notation, with $I_{0,i}$ the MRCA and $t = 2$ the present generation, $G_2^i = n = 4$ and yet $Q_2^i = 0$.

Another approach is based on the rank of $G_t^i$. Loosely speaking, we are interested in the number of descendant alleles of the generation-0 individual with the kth most genealogical descendants. More rigorously, we consider the renumbering (opposite to the way rank is typically defined in statistics) $F(t,1), \ldots, F(t,n)$ of the indices $1, \ldots, n$ such that

$$G_t^{F(t,1)} \ge \cdots \ge G_t^{F(t,n)},$$

and if $G_t^{F(t,i)} = G_t^{F(t,j)}$ then we fix $F(t,i) < F(t,j)$ when $i < j$. We then investigate $\{Q_t^{F(t,k)} : 1 \le k \le n\}$. These quantities give us concrete information about our main question in a relative sense: how much do individuals with many genealogical descendants contribute to the genetic makeup of present-day individuals compared to those with only a few? In Figure 2 we simulate our process 10000 times and then take an average for each time step, approximating $\mathbb{E}[Q_t^{F(t,k)}]$.
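The renumbering $F(t,\cdot)$ and the averaging behind Figure 2 can be sketched as follows, assuming the per-replicate values $G_t^i$ and $Q_t^i$ are already available as Python lists; the helper names are hypothetical.

```python
def rank_by_descendants(G):
    """Return [F(t,1), ..., F(t,n)] as 1-based indices.

    Indices are sorted by decreasing G[i - 1]; ties keep increasing index
    order, matching the rule F(t,i) < F(t,j) for i < j.
    """
    order = sorted(range(len(G)), key=lambda i: (-G[i], i))
    return [i + 1 for i in order]


def average_by_rank(replicates):
    """Average Q_t^{F(t,k)} over replicates given as (G, Q) pairs of lists."""
    n = len(replicates[0][0])
    totals = [0.0] * n
    for G, Q in replicates:
        for k, idx in enumerate(rank_by_descendants(G)):
            totals[k] += Q[idx - 1]
    return [s / len(replicates) for s in totals]


G = [3, 7, 3, 0, 7]                 # hypothetical G_t^1, ..., G_t^5
print(rank_by_descendants(G))       # [2, 5, 1, 3, 4]
```

Sorting on the key (-G[i], i) implements the stated tie-breaking rule in a single pass.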

After several generations, the curve depicting $\mathbb{E}[Q_t^{F(t,k)}]$ acquires a characteristic shape which persists for some time, in this figure between time 3 and time 8. In order to explain what this curve is, we need to introduce some elementary facts about branching processes.

Figure 2. The expected number of descendant alleles from historical individuals sorted by number of genealogical descendants. Results by simulation of a population of size 200. For example, the value at "genealogical rank" 50 and time 2 is the expected number of alleles in the current population which descend from one or other of the two alleles present in the individual two generations ago who had no more genealogical descendants in the present population than did 49 other individuals in the population two generations ago. As described in the text, this curve attains an interesting characteristic shape around generation 3 that lasts until generation 8. We investigate that shape in Figure 3 and Proposition 2.

Recall that a branching process is a discrete time Markov process that tracks the population size of an idealized population [Athreya and Ney, 1972; Grimmett and Stirzaker, 2001]. Each individual of generation t produces an independent random number of offspring in generation $t + 1$ according to some fixed probability distribution (the offspring distribution). This distribution is the same across all individuals. We will use the Poisson(2) branching process, where the offspring distribution is Poisson with mean 2, and write $B_t$ for the number of individuals in the tth generation starting with one individual at time $t = 0$. It is a standard fact that the random variables $W_t = B_t/2^t$ converge almost surely as $t \to \infty$ to a random variable W that is strictly positive on the event that the branching process does not die out (that is, on the event that $B_t$ is strictly positive for all $t > 0$); cf. Theorem 8.1 of Athreya and Ney [1972]. Denote by R the distribution of the limit random variable W. The probability measure R is diffuse except for an atom at 0 (that is, 0 is the only point to which R assigns non-zero mass), and the mass $R(\{0\})$ of this atom is the extinction probability of the branching process. Also, the support of R is the whole of $\mathbb{R}_+$ (that is, every open sub-interval of $\mathbb{R}_+$ is assigned strictly positive mass by R).
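Two facts about the Poisson(2) branching process are easy to check numerically: $W_t = B_t/2^t$ has mean one for every t, and the atom $R(\{0\})$ is the extinction probability q, the smallest root of $q = e^{2(q-1)}$. The following Python sketch (helper names are illustrative) computes q by fixed-point iteration and samples $W_t$. The Poisson sampler is Knuth's multiplication method, applied in chunks of rate at most 50 so that $e^{-\lambda}$ does not underflow for large rates; this is an implementation choice for the sketch, not part of the paper.

```python
import math
import random


def extinction_probability(mean=2.0, tol=1e-12):
    """Smallest root q of q = exp(mean * (q - 1)).

    Starting from q = 0, the k-th iterate equals P(B_k = 0), which
    increases to the extinction probability, i.e. the atom R({0}).
    """
    q = 0.0
    while True:
        q_next = math.exp(mean * (q - 1.0))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next


def poisson(lam, rng):
    """Poisson(lam) via Knuth's method, in chunks of rate <= 50 to avoid underflow."""
    total = 0
    while lam > 0:
        chunk = min(lam, 50.0)
        lam -= chunk
        threshold = math.exp(-chunk)
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        total += k
    return total


def normalized_size(t, rng):
    """W_t = B_t / 2**t for a Poisson(2) branching process with B_0 = 1."""
    b = 1
    for _ in range(t):
        b = poisson(2.0 * b, rng)    # total offspring of b individuals is Poisson(2b)
    return b / 2 ** t


rng = random.Random(0)
samples = [normalized_size(8, rng) for _ in range(2000)]
print(extinction_probability())                       # about 0.2032
print(sum(w == 0 for w in samples) / len(samples))    # fraction extinct by t = 8
print(sum(samples) / len(samples))                    # near E[W_t] = 1
```

Summing the offspring of the b current individuals into a single Poisson(2b) draw is valid because sums of independent Poisson variables are Poisson.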

Returning to our discussion of $\mathbb{E}[Q_t^{F(t,k)}]$, define a non-increasing function $\gamma_{t,n} : (0,1) \to \mathbb{R}_+$ by

$$\gamma_{t,n}(c) = \mathbb{E}\left[Q_t^{F(t,\lceil cn \rceil)}\right]$$
Figure 3. A plot of $\gamma_{6,200}$ and $\beta$, showing experimentally that the characteristic shape in Figure 2 is very close to the "tail-quantile" curve of a normalized Poisson(2) branching process. The curve for $\gamma_{6,200}$ was taken from Figure 2. To construct the curve for $\beta$, we wrote a subroutine that simulated 200 Poisson(2) branching processes simultaneously, then sorted the normalized results after 10 generations. This subroutine was run 10000 times and the average was taken. Note that the distribution had stabilized after 10 generations.

and define a non-increasing, continuous function $\beta : (0,1) \to \mathbb{R}_+$ by

$$\beta(c) = \min\{r \ge 0 : R((r/2, \infty)) \le c\} = \min\{r \ge 0 : R([0, r/2]) \ge 1 - c\}.$$

That is, $\beta(c)$ is the $(1-c)$th quantile of $2W$, where the random variable W is the limit of the normalized Poisson(2) branching process introduced above. Note that the function $\beta$ is strictly decreasing on the interval $(0, 1 - R(\{0\}))$; that is, $\beta(c)$ is the unique value r for which $R((r/2, \infty)) = c$ when $0 < c < 1 - R(\{0\})$.
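The Monte Carlo procedure described in the caption of Figure 3 can be sketched in Python as follows. The structure mirrors the caption (m independent processes, normalized and sorted after a fixed number of generations, averaged over runs), but the defaults use fewer generations and runs than the figure so that the sketch finishes quickly. The parameter names `m`, `generations`, `reps`, and `seed` are illustrative; entry $k - 1$ of the result is only a rough estimate of $\beta(k/m)$.

```python
import math
import random


def poisson(lam, rng):
    """Poisson(lam) via Knuth's method, in chunks of rate <= 50 to avoid underflow."""
    total = 0
    while lam > 0:
        chunk = min(lam, 50.0)
        lam -= chunk
        threshold = math.exp(-chunk)
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        total += k
    return total


def sorted_tail_curve(m=200, generations=8, reps=10, seed=0):
    """Average, over `reps` runs, of the values 2 * W_T of m independent
    Poisson(2) branching processes sorted in decreasing order (T = generations).
    """
    rng = random.Random(seed)
    totals = [0.0] * m
    for _ in range(reps):
        values = []
        for _ in range(m):
            b = 1
            for _ in range(generations):
                b = poisson(2.0 * b, rng)
            values.append(2.0 * b / 2 ** generations)    # 2 * W_T
        values.sort(reverse=True)
        for k, v in enumerate(values):
            totals[k] += v
    return [s / reps for s in totals]


curve = sorted_tail_curve()
for c in (0.1, 0.25, 0.5, 0.75):
    print(c, round(curve[math.ceil(c * len(curve)) - 1], 3))
```

The values are multiplied by 2 because $\beta$ is defined through quantiles of $2W$ rather than W.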
We see experimentally that $\gamma_{6,200}$ is quite close to $\beta$ in Figure 3, and we establish a convergence result in Proposition 2. Although a closed-form expression for the distribution R is not available, there is a considerable amount known about this classical object [Van Mieghem, 2005]. Note that the long-time behavior in Figure 2 is easily explained: it is simply the uniform distribution across only the common ancestors, which form $1 - e^{-2} \approx 0.864$ of the population.

Thus far we have examined the connection between genealogical ancestry and 
genetic ancestry in the population as a whole; one may wonder about the number 
of descendants of the MRCA itself. Unfortunately, the story there is not as simple 
as could be desired. For example, there are usually multiple MRCAs appearing 
(by definition) in the same generation, and the expected number depends on n in 
a surprising way (see Figure 4). We investigate this genealogical issue and related genetic questions in Section 4.

2. Monotonicity of the number of descendant alleles in terms of genealogy

In this section we prove the following result. 
Proposition 1. For each time $t \ge 2$, the function $k \mapsto k^{-1}\mathbb{E}[Q_t^i \mid G_t^i = k]$, $1 \le k \le n$, is strictly increasing.

The key observation in the proof of Proposition 1 will be that the random variables $G_t^i$ and $G_{t+1}^i$ enjoy the property of total positivity investigated extensively in the statistical literature following Karlin [1968] (see, for example, Brown et al. [1981]).



Definition 1. A pair of random variables (X, Y) has a strict TP(2) joint distribution if

$$\mathbb{P}\{X = x, Y = y\}\,\mathbb{P}\{X = x', Y = y'\} > \mathbb{P}\{X = x, Y = y'\}\,\mathbb{P}\{X = x', Y = y\} \tag{1}$$

for all $x < x'$ and $y < y'$ such that the left-hand side is strictly positive.

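The proof of the next result is clear.

Lemma 1. The following are equivalent to strict TP(2) for $x < x'$ and $y < y'$:

$$\frac{\mathbb{P}\{Y = y' \mid X = x'\}}{\mathbb{P}\{Y = y' \mid X = x\}} > \frac{\mathbb{P}\{Y = y \mid X = x'\}}{\mathbb{P}\{Y = y \mid X = x\}}, \tag{2}$$

$$\frac{\mathbb{P}\{X = x' \mid Y = y'\}}{\mathbb{P}\{X = x' \mid Y = y\}} > \frac{\mathbb{P}\{X = x \mid Y = y'\}}{\mathbb{P}\{X = x \mid Y = y\}}. \tag{3}$$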

Lemma 2. The pair $(G_t^i, G_{t+1}^i)$ has a strict TP(2) joint distribution.

Proof. We will show condition (2). By definition of our model, the number of genealogical descendants in generation $t + 1$ has a conditional binomial distribution as follows:

$$\mathbb{P}\{G_{t+1}^i = k \mid G_t^i = r\} = \binom{n}{k}\left(2r/n - (r/n)^2\right)^k\left(1 - 2r/n + (r/n)^2\right)^{n-k}.$$

Set $x(r) = 2r/n - (r/n)^2$, a function that is strictly increasing in r for $0 \le r \le n$. Then

$$\frac{\mathbb{P}\{G_{t+1}^i = k+1 \mid G_t^i = r\}}{\mathbb{P}\{G_{t+1}^i = k \mid G_t^i = r\}} = \frac{n-k}{k+1}\cdot\frac{x(r)}{1 - x(r)},$$

a function that is strictly increasing in r for $1 \le r < n$. □
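Lemma 2 can be sanity-checked for a small population. The Python sketch below builds the conditional binomial kernel, checks the adjacent-ratio identity from the proof, and tests Definition 1 directly on a joint law obtained by putting a hypothetical uniform prior on $G_t^i \in \{1, \ldots, n\}$. Any strictly positive prior gives the same conclusion, since multiplying the rows of the table by positive constants scales both sides of (1) by the same factor.

```python
from math import comb


def kernel(n):
    """K[r][k] = P{G_{t+1} = k | G_t = r} for r, k = 0, ..., n (conditional binomial law)."""
    K = []
    for r in range(n + 1):
        x = 2 * r / n - (r / n) ** 2      # x(r): chance a child has a parent among r individuals
        K.append([comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(n + 1)])
    return K


def is_strict_tp2(J):
    """Check Definition 1 on a joint probability table J[x][y]."""
    rows, cols = len(J), len(J[0])
    for x in range(rows):
        for x2 in range(x + 1, rows):
            for y in range(cols):
                for y2 in range(y + 1, cols):
                    lhs = J[x][y] * J[x2][y2]
                    if lhs > 0 and not lhs > J[x][y2] * J[x2][y]:
                        return False
    return True


n = 6
K = kernel(n)
joint = [[K[r][k] / n for k in range(n + 1)] for r in range(1, n + 1)]   # uniform prior on r = 1..n
print(is_strict_tp2(joint))        # True
```

The brute-force check is quartic in n, which is fine for illustration but not meant for large populations.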

The following definition is well known to statisticians [Lehmann, 1986].

Definition 2. Consider a reference measure $\mu$ on some space $\mathcal{X}$ and a parameterized family $\{p_\theta : \theta \in \Theta\}$ of probability densities with respect to $\mu$, where $\Theta$ is a subset of $\mathbb{R}$. Let T be a real-valued function defined on $\mathcal{X}$. The family of densities has the monotone likelihood ratio property in T with respect to the parameter $\theta$ if for any $\theta' < \theta''$ the densities $p_{\theta'}$ and $p_{\theta''}$ are distinct and $x \mapsto p_{\theta''}(x)/p_{\theta'}(x)$ is a nondecreasing function of $T(x)$.

Lemma 3. Fix a time $t > 0$. If the function $f : \mathbb{R} \to \mathbb{R}$ is strictly increasing, then the function $k \mapsto \mathbb{E}[f(G_t^i) \mid G_{t+1}^i = k]$, $1 \le k \le n$, is strictly increasing.

Proof. By Lemma 2 and inequality (3), the family of probability densities (with respect to counting measure) $\mathbb{P}\{G_t^i = r \mid G_{t+1}^i = k\}$ parameterized by k has monotone likelihood ratios in r with respect to k. Now apply Lemma 2(i) of Lehmann [1986]. □
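Lemma 3 can likewise be illustrated numerically. The sketch below (hypothetical helper `posterior_mean`) computes $\mathbb{E}[f(G_t^i) \mid G_{t+1}^i = k]$ by Bayes' rule from an assumed prior on $G_t^i$ and the conditional binomial likelihood of Lemma 2. The likelihood-ratio ordering does not depend on the prior, so for a strictly increasing f the resulting sequence should increase in k for any prior that is positive on $1, \ldots, n$.

```python
from math import comb


def posterior_mean(n, prior, f):
    """E[f(G_t) | G_{t+1} = k] for k = 1, ..., n by Bayes' rule.

    prior[r - 1] is an assumed P{G_t = r} for r = 1, ..., n; the likelihood is
    the conditional binomial law P{G_{t+1} = k | G_t = r} from Lemma 2.
    """
    means = []
    for k in range(1, n + 1):
        num = den = 0.0
        for r in range(1, n + 1):
            x = 2 * r / n - (r / n) ** 2
            w = prior[r - 1] * comb(n, k) * x ** k * (1 - x) ** (n - k)
            num += f(r) * w
            den += w
        means.append(num / den)
    return means


n = 8
means = posterior_mean(n, [1.0 / n] * n, lambda r: r)
print([round(m, 3) for m in means])
```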
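Proof of Proposition 1. For $t \ge 1$, let $\alpha_t(k) = k^{-1}\mathbb{E}[Q_t^i \mid G_t^i = k]$.

First note that each individual of $\mathcal{G}_1^i$ has a $1/n$ chance of choosing $I_{0,i}$ as a parent twice, thus

$$\alpha_1(k) = 1 + 1/n.$$

The result will thus follow by induction on t if we can show for $t \ge 1$ that the function $k \mapsto \alpha_{t+1}(k)$ is strictly increasing whenever the function $k \mapsto \alpha_t(k)$ is non-decreasing. Therefore, fix $t \ge 1$ and suppose that the function $k \mapsto \alpha_t(k)$ is non-decreasing.

We first claim that

$$k^{-1}\mathbb{E}[Q_{t+1}^i \mid G_t^i = r, G_{t+1}^i = k] = \tfrac{1}{2}(r^{-1} + n^{-1})\,\mathbb{E}[Q_t^i \mid G_t^i = r] = f(r), \tag{4}$$

where

$$f(r) = \tfrac{1}{2}(r^{-1} + n^{-1})\,\mathbb{E}[Q_t^i \mid G_t^i = r] = \tfrac{1}{2}(1 + r/n)\,\alpha_t(r).$$

The proof of this claim is as follows.

Recall that $\mathcal{G}_t^i$ is the set of generation t individuals descended from $I_{0,i}$, so that $\mathcal{G}_t^i$ has $G_t^i$ elements. Suppose that $G_t^i = r$ and number the elements of $\mathcal{G}_t^i$ as $1, \ldots, r$. For $1 \le j \le r$ and $c \in \{1, 2\}$, let $V_{j,c}$ be the indicator random variable for the event that the allele $A_{t,j,c}$ is descended from one of the alleles of $I_{0,i}$. By definition, the sum of the $V_{j,c}$ is equal to $Q_t^i$. Note that any individual in $\mathcal{G}_{t+1}^i$ has one parent uniformly selected from $\mathcal{G}_t^i$ and the other uniformly selected from the population as a whole, and that it inherits a uniformly chosen allele from each parent. Selections for different individuals are independent. Therefore,

$$\mathbb{E}[Q_{t+1}^i \mid G_t^i = r, G_{t+1}^i = k, V_{1,1}, V_{1,2}, \ldots, V_{r,1}, V_{r,2}] = \sum_{\ell=1}^{k}\left(\frac{1}{r}\sum_{j=1}^{r}\frac{V_{j,1} + V_{j,2}}{2} + \frac{1}{n}\sum_{j=1}^{r}\frac{V_{j,1} + V_{j,2}}{2}\right) = \frac{k}{2}(r^{-1} + n^{-1})\,Q_t^i.$$

By the tower property of conditional expectation,

$$\mathbb{E}[Q_{t+1}^i \mid G_t^i = r, G_{t+1}^i = k] = \frac{k}{2}(r^{-1} + n^{-1})\,\mathbb{E}[Q_t^i \mid G_t^i = r, G_{t+1}^i = k].$$

An application of the Markov property now establishes our claim. Thus

$$k^{-1}\mathbb{E}[Q_{t+1}^i \mid G_{t+1}^i = k] = \sum_{r=1}^{n} k^{-1}\mathbb{E}[Q_{t+1}^i \mid G_t^i = r, G_{t+1}^i = k]\,\mathbb{P}\{G_t^i = r \mid G_{t+1}^i = k\} = \sum_{r=1}^{n} f(r)\,\mathbb{P}\{G_t^i = r \mid G_{t+1}^i = k\} = \mathbb{E}[f(G_t^i) \mid G_{t+1}^i = k].$$

This is strictly increasing in k by Lemma 3 and the observation that f is strictly increasing (because $1 + r/n$ is strictly increasing and $\alpha_t$ is positive and non-decreasing). □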

3. The mysterious shape in Figure 2

In this section we investigate the shape of the curve relating the number of descendant alleles to genealogical rank. As shown in Figure 2, this curve attains a characteristic shape after several generations; the shape is maintained for a period prior to the time when the genealogical MRCA appears. We show that this curve is essentially the limiting "tail-quantile" curve of a normalized Poisson(2) branching process.
An important component of our analysis will be a multigraph representing ancestry that we will call the genealogy. A multigraph is similar to a graph except that multiple edges between pairs of nodes are allowed. Specifically, a multigraph is an ordered pair (V, E) where V is a set of nodes and E is a multiset of unordered pairs of nodes.

Definition 3. Define the time t ancestry multigraph $\mathfrak{A}_t$ as follows. The nodes of this multigraph are the set of all individuals of generations zero through t; for any $0 < t' \le t$ connect an individual $I_{t',k}$ to $I_{t'-1,j}$ if $I_{t'-1,j}$ is a parent of $I_{t',k}$. If both parents of $I_{t',k}$ are $I_{t'-1,j}$, then add an additional edge connecting $I_{t',k}$ and $I_{t'-1,j}$. Define the time t genealogy $\mathfrak{A}_t^i$ to be the subgraph of $\mathfrak{A}_t$ consisting of $I_{0,i}$ and all of its descendants $\bigcup_{t'=0}^{t}\mathcal{G}_{t'}^i$ up to time t.

Definition 4. We define an ancestry path in $\mathfrak{A}_t^i$ to be a sequence of individuals $I_{0,i(0)}, I_{1,i(1)}, \ldots, I_{t,i(t)}$ with $i(0) = i$ where for each $0 < t' \le t$, $I_{t'-1,i(t'-1)}$ is a parent of $I_{t',i(t')}$. Let $P_t^i$ be the number of ancestry paths in $\mathfrak{A}_t^i$.

We emphasize that a parent being selected twice by a single individual results in a "doubled" edge; paths that differ only in their choice of which edge to traverse between parent and child are considered distinct. Thus, each such doubled edge doubles the number of ancestry paths that contain the corresponding parent-child pair.
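Because a doubled edge is counted twice, the number of ancestry paths from $I_{0,i}$ to each later individual obeys a simple recursion: a child's count is the sum of the counts of the parents chosen by its two allele slots, so a repeated parent contributes twice. A minimal Python sketch, where the genealogy is given as a hypothetical list of parent-index pairs per generation:

```python
def path_counts(generations, i, n):
    """Number of ancestry paths from I_{0,i} to each individual of the last generation.

    generations[s][j] is the pair of parent indices (in generation s) chosen by
    the two allele slots of individual j of generation s + 1. A child that picks
    the same parent twice adds that parent's count twice: the doubled edge.
    """
    paths = [1 if j == i else 0 for j in range(n)]
    for parent_pairs in generations:
        paths = [paths[p1] + paths[p2] for p1, p2 in parent_pairs]
    return paths


# A hypothetical genealogy with n = 3 and t = 2.
genealogy = [
    [(0, 0), (0, 1), (2, 2)],     # generation 1: child 0 chose individual 0 twice
    [(0, 1), (1, 2), (2, 2)],     # generation 2
]
counts = path_counts(genealogy, 0, 3)
print(counts)                                   # [3, 1, 0]
print(sum(counts), sum(c > 0 for c in counts))  # P_t^i = 4 and G_t^i = 2 for the individual with index 0
```

Summing the final list gives $P_t^i$, and counting its nonzero entries gives $G_t^i$.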

Our result concerning the connection between the curve in Figure 2 and the Poisson(2) branching process can be stated as follows. Define a random probability measure on the positive quadrant that puts mass $1/n$ at each of the points $(\mathbb{E}[Q_t^i \mid \mathfrak{A}_t], 2^{1-t}G_t^i)$, $1 \le i \le n$. We show below that this random probability measure converges in probability to a deterministic probability measure that is concentrated on the diagonal and has projections onto either axis given by the limiting distribution of $2^{1-t}B_t$ as $t \to \infty$.

We may describe the convergence more concretely by using the idea of "sorting by the number of genealogical descendants" as in the introduction; using the notation introduced there, let the random variable $F(t,k)$ denote the index of the individual in generation 0 with the kth greatest number of genealogical descendants at time t. Recall the non-increasing, continuous function $\beta : (0,1) \to \mathbb{R}_+$ defined above.

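Proposition 2. Suppose that $0 < a < b < 1 - R(\{0\})$, so that $\infty > \beta(a) > \beta(b) > 0$. Then

$$\frac{1}{n(b-a)}\,\#\left\{an < k \le bn : \mathbb{E}\left[Q_t^{F(t,k)} \mid \mathfrak{A}_t\right] \in [\beta(b), \beta(a)]\right\}$$

converges to 1 in probability as $t = t_n$ and n go to infinity in such a way that $2^{2t_n}/n \to 0$.

Note that the condition $2^{2t_n}/n \to 0$ is satisfied, for example, when $t_n = r\log_2 n$ for $r < 1/2$.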

The proof of Proposition 2 formalizes the following three common-sense notions about the ancestry process.

Note that for $t > 1$, the genealogy will not necessarily be a tree: it may be possible to follow two different ancestry paths through $\mathfrak{A}_t^i$ to a given time-t individual. However, our first intuition is that this possibility is rare when n is large and t is small relative to n, and such events do not affect the values of $G_t^i$ and $Q_t^i$ in the limit.

Second, the fact that each of the above genealogies is usually a tree suggests that we may be able to relate the ancestry process to a branching process. In our case, the number of immediate descendants of an individual $I_{t'-1,j}$ is the number of times an individual of generation t' chooses $I_{t'-1,j}$ as a parent. These numbers are not exactly independent: for example, if all of the individuals of generation t' descend only from a single individual of generation $t' - 1$, then the number of descendants of the other individuals is exactly zero. However, we will show that these numbers are close to independent when n becomes large. Also, note that the marginal distribution of the number of next-generation descendants of a single individual is binomial: there are 2n trials each with probability $1/n$. As n goes to infinity, this is approximately a Poisson(2) random variable. In summary, we will show that the genealogy of an individual is close to that of a Poisson(2) branching process for short times relative to the population size.

Third, we note that there is a simple relationship between the number of paths $P_t^i$ and the expected number of descendant alleles $Q_t^i$:

Lemma 4. $\mathbb{E}[Q_t^i \mid \mathfrak{A}_t^i] = 2^{1-t}P_t^i$.

Proof. Consider an arbitrary path in the ancestry graph $\mathfrak{A}_t^i$ and pick an arbitrary edge in that path. Suppose the edge connects $I_{t'-1,j}$ to $I_{t',k}$. By the definition of the model, $I_{t',k}$ has probability 1/2 of inheriting any fixed allele of $I_{t'-1,j}$ through that edge. Thus, the contribution of any single allele of $I_{0,i}$ and given path in $\mathfrak{A}_t^i$ to the expectation of $Q_t^i$ is $2^{-t}$. The contribution of both alleles of $I_{0,i}$ is $2^{1-t}$. The total number of alleles descended from the alleles of $I_{0,i}$ is the sum over the contributions of all paths, and the expectation of this sum is the sum of the expectations. □
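Lemma 4 can be checked on a fixed genealogy. In the sketch below (hypothetical helper names), `expected_Q` propagates the expected number of descendant alleles exactly, halving across every edge, and is compared with $2^{1-t}P_t^i$; `sample_Q` simulates the allele choices themselves. It assumes, as in the model, that once a slot's parent individual is fixed, the slot copies each of that parent's two alleles with probability 1/2, so its average should approach the same value.

```python
import random


def path_total(generations, i, n):
    """P_t^i: total number of ancestry paths from I_{0,i}, doubled edges counted twice."""
    paths = [1 if j == i else 0 for j in range(n)]
    for parent_pairs in generations:
        paths = [paths[p1] + paths[p2] for p1, p2 in parent_pairs]
    return sum(paths)


def expected_Q(generations, i, n):
    """E[Q_t^i | genealogy]: each allele slot carries half its parent's expected count."""
    e = [2.0 if j == i else 0.0 for j in range(n)]     # I_{0,i} holds two alleles
    for parent_pairs in generations:
        e = [(e[p1] + e[p2]) / 2 for p1, p2 in parent_pairs]
    return sum(e)


def sample_Q(generations, i, n, rng):
    """One draw of Q_t^i given the genealogy; slot c copies a uniform allele of its c-th parent."""
    desc = [(j == i, j == i) for j in range(n)]        # which alleles descend from I_{0,i}
    for parent_pairs in generations:
        desc = [tuple(desc[p][rng.randrange(2)] for p in pair) for pair in parent_pairs]
    return sum(a + b for a, b in desc)


genealogy = [[(0, 0), (0, 1), (2, 2)], [(0, 1), (1, 2), (2, 2)]]   # hypothetical, n = 3, t = 2
t = len(genealogy)
print(expected_Q(genealogy, 0, 3), 2 ** (1 - t) * path_total(genealogy, 0, 3))   # 2.0 2.0
rng = random.Random(0)
draws = [sample_Q(genealogy, 0, 3, rng) for _ in range(20000)]
print(sum(draws) / len(draws))    # close to 2
```

The exact recursion mirrors the proof: each edge halves the expected contribution, and summing over parents sums over paths.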

We will use the probabilistic method of coupling to formalize the connection between the genealogical process and the branching process. A coupling of random variables X and Y that are not necessarily defined on the same probability space is a pair of random variables X' and Y' defined on a single probability space such that the marginal distributions of X and X' (respectively, Y and Y') are the same. A simple example of coupling is "Poisson thinning", a coupling between an $X \sim \mathrm{Poisson}(\lambda_1)$ and a $Y \sim \mathrm{Poisson}(\lambda_2)$ where $\lambda_1 > \lambda_2$. To construct the pair (X', Y'), one first obtains a sample for X' by simply sampling from the distribution of X. The sample for Y' is then obtained by "throwing away" each point of the sample for X' independently with probability $1 - \lambda_2/\lambda_1$; i.e. the distribution of Y' conditioned on the value x for X' is just $\mathrm{Binomial}(x, \lambda_2/\lambda_1)$.
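Poisson thinning, with retention probability $\lambda_2/\lambda_1$, takes only a few lines. This Python sketch is illustrative; Knuth's multiplication sampler is adequate here because the rates are small. By construction $Y' \le X'$, and the two values differ exactly when at least one point is discarded. The discarded points form a Poisson($\lambda_1 - \lambda_2$) count, so this happens with probability $1 - e^{-(\lambda_1 - \lambda_2)} \le \lambda_1 - \lambda_2$.

```python
import math
import random


def poisson(lam, rng):
    """Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def thinning_pair(lam1, lam2, rng):
    """Coupled (X', Y') with X' ~ Poisson(lam1), Y' ~ Poisson(lam2), lam2 <= lam1, and Y' <= X'."""
    x = poisson(lam1, rng)
    keep = lam2 / lam1
    y = sum(rng.random() < keep for _ in range(x))    # Y' | X' = x is Binomial(x, lam2 / lam1)
    return x, y


rng = random.Random(0)
pairs = [thinning_pair(3.0, 2.0, rng) for _ in range(20000)]
print(sum(x for x, _ in pairs) / len(pairs))         # near 3
print(sum(y for _, y in pairs) / len(pairs))         # near 2
print(sum(x != y for x, y in pairs) / len(pairs))    # near 1 - exp(-1), about 0.632
```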

We note that coupling is a popular tool for questions with a flavour similar to ours. Recently Barbour [2007] has coupled an epidemic model to a branching process, and Durrett et al. [2007] have used coupling to analyze a model of carcinogenesis.

Recall that we defined $W_t = B_t/2^t$, where $B_t$ is a Poisson(2) branching process started at time $t = 0$ from a single individual, and we observed that the sequence of random variables $W_t$ converges almost surely to a random variable W with distribution R. The following lemma is the coupling result that will give the convergence of the sampling distribution of the $P_t^i$ and $G_t^i$ to R in Lemma 6 below.

Lemma 5. There is a coupling between $P_t^i$, $G_t^i$, and $B_t^i$, where $B^1, B^2, \ldots$ is a sequence of independent Poisson(2) branching processes, such that for a fixed positive integer $\ell$ the probability

$$\mathbb{P}\{P_t^i = G_t^i = B_t^i,\ 1 \le i \le \ell\}$$

converges to one as n goes to infinity with $t = t_n$ satisfying $2^{2t_n}/n \to 0$.

Proof. We introduce the coupling between the ancestral process and the branching process by looking first at the transition from generation 0 to generation 1. Suppose that we designate a set S of k individuals in generation 0 and write G for the number of descendants these k individuals have in generation 1.

The probability that there is an individual in generation 1 who picks both of its parents from the k designated individuals is

$$1 - \left(1 - (k/n)^2\right)^n \le k^2/n.$$

Couple the random variable G with a random variable P that is the same as G except that we (potentially repeatedly) re-sample any generation 1 individual who chooses two parents from S until it has at least one parent not belonging to S. The random variable P will have a binomial distribution with number of trials n and success probability

$$\frac{2\frac{k}{n}\left(1 - \frac{k}{n}\right)}{1 - \frac{k^2}{n^2}} = \frac{2k/n}{1 + k/n}, \tag{5}$$

which is simply the probability of an individual selecting exactly one parent from the set of k given that it does not select two. By the above,

$$\mathbb{P}\{G \ne P\} \le \frac{k^2}{n}.$$

By a special case of Le Cam's Poisson approximation result [Grimmett and Stirzaker, 2001; Le Cam, 1960], we can couple the random variable P to a random variable Y that is Poisson distributed with mean

$$\frac{2k}{1 + k/n}$$

in such a way that

$$\mathbb{P}\{P \ne Y\} \le n\left(\frac{2k/n}{1 + k/n}\right)^2 \le \frac{4k^2}{n}.$$

Moreover, a straightforward argument using Poisson thinning shows that we can couple the random variable Y with a random variable B that is Poisson distributed with mean 2k such that

$$\mathbb{P}\{Y \ne B\} \le 2k - \frac{2k}{1 + k/n} \le 2\,\frac{k^2}{n}.$$

Putting this all together, we see that we can couple the random variables G, P, and B together in such a way that

$$\mathbb{P}(\neg\{G = P = B\}) \le 8\,\frac{k^2}{n},$$

where $\neg$ denotes complement. Note that B may be thought of as the sum of k independent random variables, each having a Poisson distribution with mean 2.
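The three error terms in this part of the proof can be evaluated directly. The Python sketch below (the function name is hypothetical) computes $1 - (1 - (k/n)^2)^n$, the Le Cam bound $np^2$ with p the success probability (5), and the thinning bound $2k - np$, so that they can be compared with $k^2/n$, $4k^2/n$, $2k^2/n$, and their total with $8k^2/n$.

```python
def lemma5_bounds(k, n):
    """The three coupling error terms from the proof of Lemma 5 for k designated individuals."""
    q = k / n
    p_double = 1 - (1 - q * q) ** n    # some child picks both parents in S
    p = 2 * q / (1 + q)                # success probability (5)
    le_cam = n * p * p                 # Le Cam: P{P != Y} <= n p^2
    thinning = 2 * k - n * p           # P{Y != B} <= 2k - E[Y], where E[Y] = n p
    return p_double, le_cam, thinning


for k, n in [(1, 10), (5, 100), (30, 10_000)]:
    a, b, c = lemma5_bounds(k, n)
    print(k, n, a <= k * k / n, b <= 4 * k * k / n, c <= 2 * k * k / n, a + b + c <= 8 * k * k / n)
```

Algebraically, $np^2 = 4k^2/(n(1 + k/n)^2)$ and $2k - np = 2k^2/(n(1 + k/n))$, which is why the second and third comparisons hold for every $1 \le k \le n$.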

Fix an index i with $1 \le i \le n$. Returning to the notation used in the rest of the paper, the above triple (G, P, B) corresponds to $(G_t^i, P_t^i, B_t^i)$, and k plays the role of $G_{t-1}^i$. Now suppose we start with one designated individual i in the population at generation 0. Let $\mathcal{E}_t$ denote the event

$$\{P_t^i = G_t^i = B_t^i\}.$$

The above argument shows that we can couple the process $P^i$ with the branching process $B^i$ in such a way that

$$\begin{aligned} \mathbb{P}\{\neg\mathcal{E}_t\} &\le \mathbb{P}\{\neg\mathcal{E}_{t-1}\} + \mathbb{P}\{\neg\mathcal{E}_t, \mathcal{E}_{t-1}\} \\ &\le \mathbb{P}\{\neg\mathcal{E}_{t-1}\} + \mathbb{E}\left[8\,\frac{(B_{t-1}^i)^2}{n}\right] \\ &\le \mathbb{P}\{\neg\mathcal{E}_{t-1}\} + \frac{c\,2^{2(t-1)}}{n} \end{aligned}$$

for a suitable constant c (using standard formulae for moments of branching processes). Iterating this bound gives

$$\mathbb{P}\{\neg\mathcal{E}_t\} \le \frac{c'\,2^{2t}}{n}$$

for a suitable constant c'.
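The constants can be made explicit. For a Poisson(2) branching process with $B_0 = 1$, conditioning on $B_t$ gives $\mathbb{E}[B_{t+1}^2] = 2\mathbb{E}[B_t] + 4\mathbb{E}[B_t^2]$, whence $\mathbb{E}[B_t^2] = 2 \cdot 4^t - 2^t$. So $c = 16$ works in the display above, and summing over generations gives $c' = 16/3$. A short Python check of these formulas (helper names are illustrative):

```python
def second_moments(t_max):
    """E[B_t^2] for t = 0, ..., t_max for a Poisson(2) branching process with B_0 = 1.

    Uses E[B_{t+1}^2] = E[Var(B_{t+1} | B_t)] + E[(2 B_t)^2] = 2 E[B_t] + 4 E[B_t^2].
    """
    m2 = [1]
    for t in range(t_max):
        m2.append(2 * 2 ** t + 4 * m2[-1])
    return m2


def accumulated_error(t, n):
    """Sum over generations s = 1..t of 8 E[(B_{s-1})^2] / n, the bound iterated above."""
    m2 = second_moments(t)
    return sum(8 * m2[s] for s in range(t)) / n


print(second_moments(5))                  # [1, 6, 28, 120, 496, 2016]
print(accumulated_error(5, 10 ** 6), 16 / 3 * 4 ** 5 / 10 ** 6)
```

Exact integer arithmetic is used for the moments so that the closed form can be compared without rounding.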

This tells us that when n is large, the random variable $P_t^i$ is close to the random variable $B_t^i$ not just for fixed times but more generally for times t such that $2^{2t}/n \to 0$. As mentioned above, this condition is satisfied when $t = r\log_2 n$ for $r < 1/2$.

Next, we elaborate the above argument to handle the descendants of $\ell$ individuals. Let $\mathcal{E}_t^\ell$ denote the event that the $\ell$ coupled triples of random variables are equal, that is,

$$\{P_t^1 = G_t^1 = B_t^1,\ P_t^2 = G_t^2 = B_t^2,\ \ldots,\ P_t^\ell = G_t^\ell = B_t^\ell\},$$

where $B^i$ is the branching process coupled to $P^i$ and $G^i$. By mimicking the above argument, we can show that

$$\mathbb{P}\{\neg\mathcal{E}_t^\ell\} \le \mathbb{P}\{\neg\mathcal{E}_{t-1}^\ell\} + c\,\ell\,2^{2(t-1)}/n.$$

Again, iterating this bound gives

$$\mathbb{P}\{\neg\mathcal{E}_t^\ell\} \le c'\,\ell\,2^{2t}/n$$

for some c'. □

For any Borel subset C of $\mathbb{R}_+^2$, let $\eta_{t,n}(C)$ denote the joint empirical distribution of the normalized $P_t^i$ and the normalized $G_t^i$ at time t, i.e.

$$\eta_{t,n}(C) = \frac{1}{n}\,\#\{1 \le i \le n : (2^{-t}P_t^i, 2^{-t}G_t^i) \in C\}.$$

In Lemma 6 we demonstrate that the $\eta_{t,n}$ converge in probability to the deterministic probability measure $\eta(dx,dy) = R(dx)\,\delta_x(dy) = \delta_y(dx)\,R(dy)$ concentrated on the diagonal, where $\delta_z$ denotes the unit point mass at z.

The mode of convergence may require a bit of explanation. When we say that a real-valued random variable converges in probability to a fixed quantity, there is an implicit and commonly understood notion of convergence of a sequence of real numbers. However, here the random quantities are probability measures, and the underlying notion we use for convergence of measures is that of weak convergence. Recall that a sequence of probability measures $\mu_n$ on $\mathbb{R}_+^2$ is said to converge to $\mu$ weakly if $\int f\,d\mu_n$ converges to $\int f\,d\mu$ for all bounded continuous functions $f : \mathbb{R}_+^2 \to \mathbb{R}$. The following are equivalent conditions for a sequence of probability measures $\mu_n$ to converge to $\mu$ weakly: (i) $\limsup_n \mu_n(F) \le \mu(F)$ for all closed sets $F \subseteq \mathbb{R}_+^2$; (ii) $\liminf_n \mu_n(G) \ge \mu(G)$ for all open sets $G \subseteq \mathbb{R}_+^2$; (iii) $\lim_n \mu_n(A) = \mu(A)$ for all Borel sets $A \subseteq \mathbb{R}_+^2$ such that $\mu(\partial A) = 0$, where $\partial A$ is the boundary of A.

Lemma 6. Suppose that $t = t_n$ converges to infinity as n goes to infinity in such a way that $\lim_{n\to\infty} 2^{2t_n}/n = 0$. Then the sequence of random measures $\eta_{t,n}$ converges in probability as $n \to \infty$ to the deterministic probability measure $\eta$ on $\mathbb{R}_+^2$ that assigns mass $R(A \cap B)$ to sets of the form $A \times B$.

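Proof. For brevity, let $H_t^i$ denote the pair $(2^{-t}P_t^i, 2^{-t}G_t^i)$. Fix a bounded continuous function $f : \mathbb{R}_+^2 \to \mathbb{R}$. By definition,

$$\mathbb{E}\left[\left(\int f\,d\eta_{t,n}\right)^2\right] = n^{-2}\,\mathbb{E}\left[\sum_i f(H_t^i)^2 + \sum_{i \ne j} f(H_t^i)\,f(H_t^j)\right] = n^{-2}\left(n\,\mathbb{E}[f(H_t^1)^2] + n(n-1)\,\mathbb{E}[f(H_t^1)\,f(H_t^2)]\right).$$

Hence, since f is bounded, $\mathbb{E}[(\int f\,d\eta_{t,n})^2]$ differs by $O(1/n)$ from

$$\mathbb{E}[f(H_t^1)\,f(H_t^2)]. \tag{6}$$

By definition, (6) is equal to

$$\mathbb{E}\left[f(2^{-t}P_t^1, 2^{-t}G_t^1)\,f(2^{-t}P_t^2, 2^{-t}G_t^2)\right]. \tag{7}$$

Lemma 5 establishes a coupling such that $P_t^i = G_t^i = B_t^i$ for $i = 1, 2$ with probability tending to one in the limit under our hypotheses. Thus, under our conditions on $t = t_n$, the expectation (7), and hence $\mathbb{E}[(\int f\,d\eta_{t,n})^2]$, converges to

$$\lim_{t\to\infty}\mathbb{E}[f(W_t^1, W_t^1)\,f(W_t^2, W_t^2)] = \lim_{t\to\infty}\mathbb{E}[f(W_t^1, W_t^1)]\,\mathbb{E}[f(W_t^2, W_t^2)] = \left(\int f(x,x)\,R(dx)\right)^2 = \left(\int f\,d\eta\right)^2,$$

where $W_t^i = B_t^i/2^t$ for the independent branching processes $B^1$ and $B^2$.

A similar but simpler argument shows that $\mathbb{E}[\int f\,d\eta_{t,n}]$ converges to $\int f\,d\eta$. Combining these two facts shows that $\mathrm{Var}[\int f\,d\eta_{t,n}]$ converges to zero.

Therefore, $\int f\,d\eta_{t,n}$ converges in probability to $\int f\,d\eta$ for all bounded continuous functions f, as required. □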

Proof of Proposition [H It suffices by Lemma [4] to show that 



n(b — a) 



*{ 



an<k <bn: 2~ J P 



tpF(t,k) 



[2/3(6), 20(a)]] 



converges to 1 in probability as t = t n and n go to infinity in such a way that 
2 2t "/n 0. 
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For 7 > and an integer 1 < k < n, 

k 



r?t,«(K+ x [7,00)) > 



& #{1 < i < n : 2~*Gj > 7} > k 
# 2- t G? {t > k) > 7, 

by definition of the empirical distribution $\eta_{t,n}$ and the indices $F(t,k)$. Because the
limit measure $\eta$ assigns zero mass to the boundary $\mathbb{R}_+ \times \{\gamma\}$ of the set $\mathbb{R}_+ \times [\gamma, \infty)$,
it follows from Lemma 6 that

$$\frac{1}{n}\,\#\{1 \le i \le n : 2^{-t}G_t^i \ge \gamma\}$$

converges to $\eta(\mathbb{R}_+ \times [\gamma, \infty)) = R([\gamma, \infty))$ in probability. In particular,

$$\frac{1}{n}\,\#\{1 \le i \le n : 2^{-t}G_t^i \ge 2\beta(c)\}$$

converges to $c$ in probability for $0 < c < 1 - R(\{0\})$. Thus, $2^{-t}G_t^{F(t,\lfloor cn \rfloor)}$ converges
in probability to $2\beta(c)$ for such a $c$.

With $0 < a < b < 1 - R(\{0\})$ as in the statement of the proposition, it follows
that

$$\frac{1}{n}\,\#\left\{an \le k \le bn : 2^{-t}G_t^{F(t,k)} \in [2\beta(b - \epsilon), 2\beta(a + \epsilon)]\right\}$$

converges in probability to $b - a - 2\epsilon$ for $0 < \epsilon < (b - a)/2$.
Note by Lemma 6 that

$$\frac{1}{n}\,\#\left\{1 \le k \le n : \left|2^{-t}P_t^{F(t,k)} - 2^{-t}G_t^{F(t,k)}\right| > \delta\right\} = \eta_{t,n}(\{(x,y) : |x - y| > \delta\})$$

converges in probability to $0$ for any $\delta > 0$, because the probability measure $\eta$
assigns all of its mass to the diagonal $\{(x,y) \in \mathbb{R}_+^2 : x = y\}$.
Taking $\delta < 2\min\{\beta(a) - \beta(a + \epsilon), \beta(b - \epsilon) - \beta(b)\}$ so that

$$[2\beta(b - \epsilon) - \delta, 2\beta(a + \epsilon) + \delta] \subset [2\beta(b), 2\beta(a)],$$

letting $n$ tend to infinity, and then sending $\epsilon$ to zero completes the proof.

□
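The chain of equivalences for the ranked indices $F(t,k)$ is elementary but easy to misread. The short Python check below assumes, hypothetically, that ties among the $G_t^i$ are broken by index. On concrete data it confirms that the three conditions $\eta_{t,n}(\mathbb{R}_+ \times [\gamma,\infty)) \ge k/n$, $\#\{i : 2^{-t}G_t^i \ge \gamma\} \ge k$, and $2^{-t}G_t^{F(t,k)} \ge \gamma$ agree for every $k$. The helper names are illustrative.

```python
def rank_indices(G):
    """F[k - 1] is the index with the k-th largest value of G.

    Ties are broken by index; this is a hypothetical convention, and the
    equivalences checked below do not depend on how ties are broken.
    """
    return sorted(range(len(G)), key=lambda i: (-G[i], i))


def equivalences_hold(G, t, gamma):
    """Check that the three conditions in the proof agree for k = 1, ..., n."""
    n = len(G)
    scaled = [g / 2 ** t for g in G]          # 2^{-t} G_t^i
    F = rank_indices(G)
    count = sum(1 for v in scaled if v >= gamma)
    mass = count / n                          # eta_{t,n}(R_+ x [gamma, inf))
    for k in range(1, n + 1):
        by_measure = mass >= k / n
        by_count = count >= k
        by_rank = scaled[F[k - 1]] >= gamma
        if not (by_measure == by_count == by_rank):
            return False
    return True
```

For example, `equivalences_hold([3, 17, 0, 17, 5, 40, 1, 9], 3, 1.0)` returns `True`. Here four scaled values are at least 1, and they are exactly the values at ranks 1 through 4.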



As an application of this proposition, consider the number of descendant alleles of
those individuals with many genealogical descendants. One might imagine that the
number of descendant alleles of each individual stays bounded; however, this is not
the case.

Corollary 1. Fix $y > 0$, and suppose $t = t_n$ satisfies $\lim_{n \to \infty} 2^{2t_n}/n = 0$. With
probability tending to one as $n$ goes to infinity, there will be an individual $i$ in the
population at time 0 such that $E[Q_t^i \mid \mathcal{G}_t] > y$.

Proof. Because the support of the probability distribution $R$ is all of $\mathbb{R}_+$, the
function $\beta$ is unbounded. The result is then immediate from Proposition 5. □
Figure 4. The dependence of the expected number of MRCAs on
population size. Average of 10000 simulations. 

4. The number of MRCAs and the number of descendant alleles per MRCA



There are a number of other phenomena that seem more difficult to investigate
analytically but are interesting enough to deserve mention. For the simulations of
this section (and the one mentioned in the introduction) we wrote a series of simple
OCaml programs, which are available upon request.

As mentioned in the introduction, it is not uncommon to get several genealogical
MRCAs simultaneously. We denote the (random) time to achieve a genealogical
MRCA for a population of size $n$ by $T_n$, and the (random) number of
genealogical MRCAs for a population of size $n$ by $M_n$. The surprising dependence
of $E[M_n]$ on $n$ is shown in Figure 4.
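The quantities $T_n$ and $M_n$ are straightforward to simulate. The following Python sketch is a hypothetical re-implementation, not the OCaml programs mentioned above. It works backward from the present and assumes that every individual draws two parents independently and uniformly at random, with replacement, from the previous generation. For each ancestor it stores the set of that ancestor's present-day descendants as a bit mask. $T_n$ is the first generation in which some ancestor's mask covers everyone, and $M_n$ is the number of such ancestors.

```python
import random


def time_and_number_of_mrcas(n, rng):
    """Simulate one population of size n >= 2 backward in time.

    Returns (T_n, M_n). Hypothetical sketch: every individual draws two
    parents independently and uniformly at random, with replacement,
    from the previous generation.
    """
    everyone = (1 << n) - 1
    # desc[i]: bit mask of present-day individuals descended from i
    desc = [1 << j for j in range(n)]
    t = 0
    while True:
        t += 1
        older = [0] * n
        for j in range(n):
            for _ in range(2):                 # two parent slots per individual
                older[rng.randrange(n)] |= desc[j]
        desc = older
        m = sum(1 for d in desc if d == everyone)
        if m:
            return t, m


def mean_number_of_mrcas(n, trials, seed=0):
    """Monte Carlo estimate of E[M_n]."""
    rng = random.Random(seed)
    total = sum(time_and_number_of_mrcas(n, rng)[1] for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n in (20, 40, 80):
        print(n, mean_number_of_mrcas(n, 200))
```

Plotting `mean_number_of_mrcas(n, trials)` over a range of `n` gives a curve of the same kind as Figure 4. The bit-mask representation keeps each generation's update to a few integer OR operations per individual.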

The situation becomes clearer when we investigate the conditional expectation
$E[M_n \mid T_n]$, as shown in Figure 5. By the law of total expectation, $E[M_n]$ is the
sum of the conditional expectations $E[M_n \mid T_n = k]$ weighted by the probabilities
$P\{T_n = k\}$. In this setting,

$$E[M_n] = \sum_k E[M_n \mid T_n = k]\,P\{T_n = k\}.$$

First note in Figure 5 (a) that $E[M_n \mid T_n = k]$ appears to be a decreasing function of
$n$ when $k$ is fixed. This is not too surprising: imagine that we are doing simulations
with $n$ individuals but only looking at the results of those simulations for which $T_n = k$.
When $n$ gets large, simulations with $T_n = k$ are ones that take an unusually
short time to reach a genealogical MRCA, so it is not surprising that the number of
MRCAs is small in this case. Conversely, simulations for which $T_n$ is significantly bigger
than $\log_2 n$ are ones that take an unusually long time; it is not surprising that such
simulations have a larger number of MRCAs, as they have more individuals "ready"
to become MRCAs just before $T_n$. This argument is bolstered by Figure 6, which
shows that simulations resulting in different values of $T_n$ have remarkably similar behavior.
Specifically, the distribution of the number of genealogical descendants sorted by
rank does not show a very strong dependence on the time to most recent common
ancestor $T_n$. Therefore, simulations for a given population size that have a smaller
$T_n$ have fewer individuals who are close to being MRCAs, while simulations with a
larger $T_n$ have more.

Figure 5. The number of MRCAs where the dependence on $T_n$
(the time to MRCA) is taken into account. Average of 10000 simu-
lations. Panel (a) shows the number of MRCAs at $T_n$ conditioned
on $T_n$. Panel (b) shows the dependence of the distribution of times
to MRCA on population size. As described in the text, it is the
combination of these two distributions using the law of total expec-
tation that produces the "bumps" of Figure 2. Note that several
simulations with "extreme" values of $T_n$ have been eliminated from
(a) for clarity; these combinations of $T_n$ and population size are
rare, and thus we would not get an accurate estimate of the expec-
tation.

Second, note in Figure 5 (b) that the distribution of $T_n$ has bumps: at least for
integers $k \ge 3$, there is an interval of $n$ on which $P\{T_n = k\}$ is large. In such an
interval we are approximately on a single line of Figure 5 (a); that is, $E[M_n]$ is
approximately $E[M_n \mid T_n = \kappa_n]$, where $\kappa_n$ is the most likely value of $T_n$. As
described in the previous paragraph, this conditional expectation decreases with $n$
while $\kappa_n$ stays fixed, so we should see a dip in $E[M_n]$. Indeed, the plots of
Figures 4 and 5 (b) show that the dips in the number of MRCAs correspond to the
peaks of the probability of a given $T_n$.
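The decomposition $E[M_n] = \sum_k E[M_n \mid T_n = k]\,P\{T_n = k\}$ and the "most likely value" argument can both be checked on simulated pairs $(T_n, M_n)$. The small Python helper below uses hypothetical names. It computes the plain sample mean of $M_n$, the weighted sum of empirical conditional means, the table of $(E[M_n \mid T_n = k], P\{T_n = k\})$ estimates, and the empirical mode $\kappa_n$. For any sample, the first two quantities agree up to rounding, because this is the law of total expectation applied to the empirical distribution.

```python
from collections import defaultdict


def total_expectation(samples):
    """samples: list of (T, M) pairs from repeated simulations.

    Returns (direct, weighted, table, kappa):
      direct   -- sample mean of M
      weighted -- sum over k of E[M | T = k] * P{T = k}
      table    -- k -> (empirical E[M | T = k], empirical P{T = k})
      kappa    -- the most frequently observed value of T
    """
    size = len(samples)
    direct = sum(m for _, m in samples) / size
    by_time = defaultdict(list)
    for t, m in samples:
        by_time[t].append(m)
    table = {k: (sum(ms) / len(ms), len(ms) / size)
             for k, ms in sorted(by_time.items())}
    weighted = sum(cond * prob for cond, prob in table.values())
    kappa = max(table, key=lambda k: table[k][1])
    return direct, weighted, table, kappa
```

For example, `total_expectation([(3, 1), (3, 2), (4, 1), (4, 3), (4, 2)])` gives a direct mean and a weighted sum that are both 1.8 up to floating-point rounding, with `kappa == 4`. When `kappa` carries most of the probability, `direct` is close to `table[kappa][0]`, which is the approximation used above.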

Now we return to the genetic story considered in the rest of the paper. The above
considerations certainly apply when formalizing questions such as "how genetically
related is the MRCA to individuals of the present day?" Clearly, there will often
be several MRCAs rather than only one. Furthermore, the dynamics of the
number of MRCAs play an important part in the answer to the question.

In Figure 7 we show the number of alleles descended from the union of the
MRCAs as a function of $n$. This shows oscillatory behavior as in Figure 4; however,
the effect is modulated by the results shown in Figure 5. Specifically, although
the number of MRCAs decreases with $n$ conditioned on a value of $T_n$, the number
of descendant alleles per MRCA actually increases. The combination of these
two functions still appears to be decreasing, which creates the "dips" in
Figure 7.
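The allele count discussed here can also be sketched in Python. This hypothetical version is not the authors' code and makes two assumptions. First, every individual draws two parents uniformly at random, with replacement. Second, allele slot $s$ of an individual is a copy of a uniformly chosen allele of its parent $s$, with that choice drawn once per allele so that lineages passing through the same allele share their history. The sketch finds $T_n$ and the set of MRCAs as a bit-mask computation, then traces each of the $2n$ present-day alleles back $T_n$ generations. It counts the alleles whose ancestral copy belongs to an MRCA.

```python
import random


def alleles_from_mrcas(n, rng):
    """Simulate one population of size n >= 2.

    Returns (T_n, M_n, hits), where hits is the number of present-day
    alleles descended from the union of the MRCAs. Hypothetical sketch:
    each individual draws two parents uniformly at random with
    replacement, and its allele slot s copies a uniformly chosen allele
    of parent s.
    """
    everyone = (1 << n) - 1
    desc = [1 << j for j in range(n)]   # present-day descendants as bit masks
    pedigree = []  # pedigree[g][j][s] = (parent, parent's allele slot copied)
    while True:
        gen = [tuple((rng.randrange(n), rng.randrange(2)) for _ in range(2))
               for _ in range(n)]
        pedigree.append(gen)
        older = [0] * n
        for j, ((p0, _), (p1, _)) in enumerate(gen):
            older[p0] |= desc[j]
            older[p1] |= desc[j]
        desc = older
        mrcas = {i for i, d in enumerate(desc) if d == everyone}
        if mrcas:
            break
    hits = 0
    for j in range(n):
        for slot in (0, 1):
            ind, s = j, slot
            for gen in pedigree:
                ind, s = gen[ind][s]    # follow the allele to the copy it came from
            hits += ind in mrcas
    return len(pedigree), len(mrcas), hits
```

Averaging `hits` over many runs for each `n` gives a quantity of the kind plotted in Figure 7. Grouping `hits / M_n` by $T_n$ gives the per-MRCA average of the kind plotted in Figure 8.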
Figure 6. A plot of $E[G_t^{F(t,k)} \mid T_n]$ through time, conditioning
on $T_n$. Each curve for a given choice of $T_n$ represents the expected
state of the process at a given time. That is, each curve represents
the image of the map $k \mapsto E[G_t^{F(t,k)} \mid T_n]$ for some choice of $t$
and $T_n$. As described in the text, the curves show surprisingly
little dependence on $T_n$, rather depending almost exclusively on $t$.
Average of 10000 simulations with $n = 200$.




Figure 7. The number of alleles descended from the union of
the MRCAs versus population size. This plot shows oscillatory
behavior similar to that in Figure 4, but the effect is dampened by
the fact that the average number of alleles descended from each
MRCA increases with $n$, as shown in Figure 8. Average of 10000
simulations. Some simulations with "extreme" values of $T_n$ were
excluded for clarity, as in Figure 5 (a).



Figure 8. The average number of alleles descended from each
MRCA conditioned on $T_n$.

The apparent fact that, with $T_n$ fixed, the average number of alleles descended
from each MRCA increases with $n$ deserves some explanation. As demon-
strated in Lemma 1, the expected number of descendant alleles of an individual is
a multiple of the number of paths in the genealogy from that individual to its
present-day descendants. Therefore, the fact needing explanation is the apparent
increase in the number of paths as $n$ increases. This can be explained in a way
similar to that for the conditioned number of MRCAs. Let us again fix $T_n = k$ and
vary $n$. When $n$ gets large, simulations that attain the required value of $T_n$ have
found a common ancestor quite quickly. In these cases the "ancient" endpoints of the
paths should be tightly concentrated on the most recent common ancestors. On the
other hand, for small $n$ the simulations have reached $T_n$ relatively slowly, so the
distribution of paths is more diffuse.
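The link between expected descendant alleles and path counts can be made concrete under a hypothetical inheritance rule: allele slot $s$ comes from parent $s$, which transmits one of its two alleles with probability 1/2 each. Under this rule, a present-day allele follows a random path back through the pedigree in which only the first step is forced. It therefore reaches a given ancestor $t$ generations back with probability (number of paths from that ancestor to the allele's carrier)$/2^{t-1}$. Summing over the $2n$ present-day alleles, the expected number descended from ancestor $a$ is $2^{1-t}P(a)$, or $2^{-t}P(a)$ per ancestral allele. This constant is derived for the sketch's assumptions only. The Python code below uses hypothetical helper names. It computes $P(a)$ by dynamic programming and compares $2^{1-t}P(a)$ with a Monte Carlo estimate. By linearity of expectation, the present-day alleles can be traced independently for this comparison.

```python
import random


def random_pedigree(n, t, rng):
    """pedigree[g][j] = (parent in slot 0, parent in slot 1) of individual j,
    g generations back (g = 0 is the present). Parents are drawn uniformly
    at random with replacement (an assumption of this sketch)."""
    return [[(rng.randrange(n), rng.randrange(n)) for _ in range(n)]
            for _ in range(t)]


def path_counts(pedigree, n):
    """Number of pedigree paths from each ancestor, len(pedigree)
    generations back, down to the present generation."""
    counts = [1] * n                  # each present individual: one trivial path
    for gen in pedigree:
        older = [0] * n
        for j, (p0, p1) in enumerate(gen):
            older[p0] += counts[j]    # paths through parent slot 0
            older[p1] += counts[j]    # paths through parent slot 1
        counts = older
    return counts


def mc_descendant_alleles(pedigree, n, ancestor, trials, rng):
    """Monte Carlo estimate of the expected number of present-day alleles
    descended from either allele of `ancestor`."""
    hits = 0
    for _ in range(trials):
        for j in range(n):
            for slot in (0, 1):
                ind, s = j, slot
                for gen in pedigree:
                    ind = gen[ind][s]         # parent that supplied this allele
                    s = rng.randrange(2)      # which of that parent's two alleles
                hits += ind == ancestor
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(7)
    n, t = 20, 4
    ped = random_pedigree(n, t, rng)
    P = path_counts(ped, n)
    a = max(range(n), key=P.__getitem__)
    print("formula:", 2 ** (1 - t) * P[a],
          "simulated:", mc_descendant_alleles(ped, n, a, 2000, rng))
```

A useful sanity check is that the path counts over all ancestors sum to $n \cdot 2^t$, because every present-day individual has exactly $2^t$ paths back $t$ generations. The expected allele counts therefore sum to $2n$.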



We have investigated the connection between genetic ancestry and genealogical
ancestry in a natural genetic model extending the genealogical model of [Chang,
1999]. We have shown that an increased number of genealogical descendants im-
plies a super-linear increase in the number of descendant alleles. We have tracked
how the number of genetic descendants depends on the number of genealogical de-
scendants through time and shown that it acquires an understandable shape for a
period of time before $T_n$ (the time of the genealogical MRCA). We have also inves-
tigated the number of MRCAs at $T_n$ and the number of alleles descending from
the MRCAs, and used simulations to explain their surprising oscillatory dependence
on population size.